Project Motivation
Background on Dialysis and Proposition 29
Dialysis is a lifesaving treatment that removes waste from the blood, acting as an artificial kidney for people with chronic kidney disease. Proposition 29 (the Dialysis Clinic Requirements Initiative), which failed to pass, was the third attempt to increase regulation of dialysis clinics in the state, preceded by Prop. 8 in 2018 and Prop. 23 in 2020. It aimed to establish stricter staffing and operating requirements for the roughly 600 dialysis clinics in California, an estimated $3.5 billion industry. Notably, the proposed regulation required a physician or licensed practitioner to be present during all treatment hours, increasing each clinic’s costs by several hundred thousand dollars annually.
Debate Surrounding Dialysis Regulations
- Proponents argue that increased regulations improve patient safety and quality of care.
- Opponents contend that the increase in healthcare costs is unwarranted and would limit care coverage by overwhelming facilities with costs and potentially forcing them to close.
Project Scope and Significance
Our project explores trends in dialysis clinic access, quality of care, and ballot results in California in recent years. Using publicly available data from the Centers for Medicare & Medicaid Services (CMS) and aggregated election results from California’s Secretary of State (SOS), we analyze associations between dialysis care and voting behavior, specifically on recent statewide ballot initiatives designed to regulate California’s multibillion-dollar dialysis industry, including 2022’s Proposition 29, which failed to pass by a large margin. Although private health insurers typically pay higher rates, Medicare pays for dialysis treatment for the majority of people on dialysis in California (source), so most patient interactions with dialysis clinics should be captured in the Medicare and Medicaid datasets.
Relevance and Novelty
- This project uses a novel approach to study an area of public interest relevant not just to California but the entire country.
- To our knowledge, this is the first project of its kind to explore the possible association between voting patterns and the quality of dialysis care received.
- The majority of dialysis treatments in California are covered by Medicare, making the Medicare and Medicaid datasets particularly relevant to our analysis.
Broader Context
Investigative reporters, patient advocacy groups, and labor organizations have spent significant resources over the past decade to raise public awareness of the dialysis industry and its need for regulation. Our project contributes to this ongoing discussion by providing data-driven insights into the relationship between dialysis care quality and voting behavior.
Project Research Questions
Our research questions are divided into two categories: primary and secondary. The primary question serves as the main focus of our analysis, while secondary questions provide additional insights through further investigation of the data.
Primary Research Question
Is the quality of care of dialysis facilities correlated with voting in favor of or against dialysis industry regulation?
Key Assumptions
- The relationship between Quality of Care and Voting Behavior is not confounded.
- A vote in favor of any of the three propositions (Prop 8 in 2018, Prop 23 in 2020, Prop 29 in 2022) can be interpreted as support for dialysis industry regulation.
Quality of Care Metrics
To test this relationship under the outlined assumptions, we approximate the quality of care using the following metrics:
- Five-star rating
- Patient experience rating
- Facility mortality rate
- Number of available dialysis stations
- Staff rating
- Hospital readmission categorization (Worse than Expected, As Expected, Better than Expected)
- Profit/non-profit designation
- Parent company affiliation/independence
Facility Categorization
Observations of Quality of Care metrics in our data are categorized by:
- Year
- County
- City
- Profit/Non-profit designation
- Parent company affiliation/independence
Secondary Research Questions
- What is the geographic coverage of dialysis facilities in California?
- Is there any correlation between organizational structure (chain-owned, profit vs. non-profit) and the quality of care?
- Is there any association between the parent company of dialysis facilities and the quality of care?
Data Sources
Primary Data Sources
CMS Quarterly Dialysis Facility Compare Dataset
- Key features:
  - Star ratings for facilities
  - Patient experience metrics
  - Quality of care metrics
- Insights provided:
  - Patient satisfaction
  - Clinical outcomes
  - Doctor-patient communication
  - Hospitalization rates
  - Treatment effectiveness
- Rating calculation:
  - Patient experience: bi-annual surveys
  - Facility ratings based on:
    - Unplanned hospital readmissions
    - Total and expected transfusions
    - Ratio of deaths to expected deaths
    - Waste removal efficiency
CA Secretary of State’s Statement of Vote
- Elections: November 2022, 2020, and 2018
- Focus: Propositions on dialysis clinic requirements
- Geographic levels:
  - Counties
  - Sub-counties:
    - Congressional districts
    - State senate districts
    - State assembly districts
    - Cities
Secondary Data Source
CA Health and Human Services Specialty Care Clinic Data
- Purpose: Supplement CMS dataset with geographic data
- Additional features:
  - Senate district
  - Congressional district
  - Latitude and longitude
Data Integration and Analysis Potential
- Multiple geographic levels for varied scale analysis
- Combining clinical (CMS) and voting (SOS) data enables exploration of correlations between the two
- Enhanced spatial analysis with supplementary geographic data
Data Manipulation Methods
Our workflow was broken down into five stages:
- Data Collection
- Data Preparation
- Database Management
- Exploratory Data Analysis
- Statistical Analysis
Data Collection and Preparation
Organization and Import
- Dataset structure: .zip files (one per year), containing multiple Excel files
- Focus: Excel files relevant to facility general information, ratings, and patient survey results
- Import result: Two separate parquet files at the facility level
  - Patient survey responses
  - Facility ratings and measurements
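The selection of relevant files from each yearly archive can be sketched as follows. This is a minimal, standard-library illustration, not the project's import code: the file names in `WANTED_FILES` and the demo archive are hypothetical, and the sketch stops at extracting the raw bytes rather than parsing Excel or writing parquet. It selects members from an explicit list of names instead of relying on pattern matching.

```python
import io
import zipfile

# Hypothetical file names; the real archive members differ by year.
WANTED_FILES = ["facility_info.xlsx", "patient_survey.xlsx"]


def select_members(archive_bytes, wanted):
    """Return {base name: raw bytes} for archive members named in `wanted`."""
    selected = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for member in archive.namelist():
            base_name = member.rsplit("/", 1)[-1]
            if base_name in wanted:
                selected[base_name] = archive.read(member)
    return selected


def build_demo_archive():
    """Create a small in-memory zip so the sketch runs without real data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("2021/facility_info.xlsx", b"facility bytes")
        archive.writestr("2021/patient_survey.xlsx", b"survey bytes")
        archive.writestr("2021/readme.txt", b"not needed")
    return buffer.getvalue()


demo_files = select_members(build_demo_archive(), WANTED_FILES)
print(sorted(demo_files))
```

An explicit name list costs a little maintenance per year but fails loudly when a file is renamed, which is easier to notice than a pattern that silently matches nothing.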
Challenges and Solutions
- Inconsistent File Naming Conventions
  - Issue: 2021 files named differently (e.g., patient survey data file named ‘59mq-zhts’)
  - Solution: Created a list of exact file names for selection, rather than using pattern matching
- Missing Data
  - Expected missing data: Survey non-responses
  - Unexpected missing data: Administrative errors (e.g., missing columns in recent ICHPS raw data files)
  - Solution for specific cases: Simple imputation during analysis (e.g., substituting 2018 ‘nan’ values with 2019 values at the facility level)
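The facility-level imputation described above can be illustrated with a short sketch. It assumes records are dictionaries keyed by hypothetical field names (`facility_id`, `year`, `staff_rating`) and that missing values are represented as `None` rather than ‘nan’; it is an illustration of the idea, not the project's code.

```python
def impute_from_donor_year(records, target_year=2018, donor_year=2019):
    """Fill missing fields in target_year rows from the same facility's donor_year row."""
    donors = {r["facility_id"]: r for r in records if r["year"] == donor_year}
    imputed = []
    for record in records:
        row = dict(record)  # copy so the input list is left untouched
        donor = donors.get(row["facility_id"])
        if row["year"] == target_year and donor is not None:
            for field, value in row.items():
                if value is None and donor.get(field) is not None:
                    row[field] = donor[field]
        imputed.append(row)
    return imputed


# Placeholder records, not project data.
records = [
    {"facility_id": "A", "year": 2018, "staff_rating": None},
    {"facility_id": "A", "year": 2019, "staff_rating": 4.0},
    {"facility_id": "B", "year": 2018, "staff_rating": None},
]
filled = impute_from_donor_year(records)
```

Facility B has no 2019 row, so its 2018 value stays missing and would later be handled by whatever missing-data rule the analysis applies.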

SOS Ballot Data
Import and Selection
- Data imported via URL for each relevant proposition year (2018, 2020, 2022)
- Selected columns containing ‘Kidney’ or ‘Dialysis’ for analysis
- Geographic column manipulation:
  - Renamed columns
  - Backfilled rows to address multi-level index (sub-counties under counties)
- Final output: Single ballot data parquet file
  - Includes year column
  - County and sub-county vote counts for each Dialysis Requirements Initiative proposition
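A minimal sketch of the column selection and backfilling steps, under stated assumptions: the header names and row layout below are hypothetical, and the sketch assumes the county label appears only on the last row of each county block, so that empty cells above it take the next label below (a backfill). The actual Statement of Vote layout may differ.

```python
def select_measure_columns(header, keywords=("Kidney", "Dialysis")):
    """Return indices of columns whose header mentions any keyword."""
    return [i for i, name in enumerate(header) if any(k in name for k in keywords)]


def backfill_county(rows):
    """Fill empty county cells with the next non-empty county label below them."""
    filled = [dict(row) for row in rows]
    next_label = None
    for row in reversed(filled):
        if row["county"]:
            next_label = row["county"]
        elif next_label is not None:
            row["county"] = next_label
    return filled


# Hypothetical header and rows, not the real file contents.
header = ["County", "Area", "Governor Yes",
          "Prop 29 Kidney Dialysis Yes", "Prop 29 Kidney Dialysis No"]
measure_columns = select_measure_columns(header)

rows = [
    {"county": "", "area": "City X"},
    {"county": "", "area": "City Y"},
    {"county": "County A", "area": "County Total"},
    {"county": "", "area": "City Z"},
    {"county": "County B", "area": "County Total"},
]
labeled = backfill_county(rows)
```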
Challenges and Solutions
- Inconsistent naming conventions across years
  - 2020 and 2022: ‘County Supervisorial’
  - 2018: ‘Supervisorial District’
  - Solution: Standardized naming across all years
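One simple way to standardize such labels is a lookup table applied to every year's data. The sketch below assumes ‘Supervisorial District’ is chosen as the canonical name; the source does not say which variant was kept, so that choice is illustrative.

```python
# Assumed canonical label; either variant could serve as the standard.
CANONICAL_LEVEL_NAMES = {
    "County Supervisorial": "Supervisorial District",
    "Supervisorial District": "Supervisorial District",
}


def standardize_level(name):
    """Map a year-specific geographic level label to its canonical form."""
    cleaned = name.strip()
    return CANONICAL_LEVEL_NAMES.get(cleaned, cleaned)


labels_by_year = {2018: "Supervisorial District", 2020: "County Supervisorial "}
standardized = {year: standardize_level(label) for year, label in labels_by_year.items()}
```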
CHHS Specialty Care Clinic Complete Data Set
Import and Alignment
- Downloaded Excel files for 2013 through 2023 (one per year)
- Main challenge: Aligning pre-2018 data with 2018-forward structure
- Process:
  - Separated data into two dataframes: 2013-2017 and 2018-2023
  - Used CHHS mapping dictionary to rename 2013-2017 columns
  - Ensured consistent data types across both dataframes
  - Merged dataframes using outer join on common columns
  - Dropped rows with missing FAC_NO (facility data)
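The alignment process can be sketched with plain dictionaries. Everything except `FAC_NO` is an assumption made for illustration: the legacy column names in `LEGACY_TO_CURRENT`, the sample rows, and the choice to model the outer join as a union that keeps every column and leaves absent cells empty.

```python
# Hypothetical legacy-to-current column mapping (the real CHHS dictionary differs).
LEGACY_TO_CURRENT = {"OSHPD_ID": "FAC_NO", "FACILITY_NAME": "FAC_NAME"}


def rename_columns(rows, mapping):
    """Rename keys according to mapping, leaving unmapped keys unchanged."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def coerce_types(rows, casts):
    """Apply casts[column] to every non-missing value in that column."""
    return [
        {k: (casts[k](v) if k in casts and v is not None else v) for k, v in row.items()}
        for row in rows
    ]


def outer_union(*tables):
    """Stack tables, keeping every column; cells absent from a row become None."""
    columns = []
    for table in tables:
        for row in table:
            for column in row:
                if column not in columns:
                    columns.append(column)
    return [{c: row.get(c) for c in columns} for table in tables for row in table]


def drop_missing(rows, column):
    """Keep only rows that have a value in the given column."""
    return [row for row in rows if row.get(column) not in (None, "")]


# Placeholder rows, not project data.
legacy = [
    {"OSHPD_ID": "1001", "FACILITY_NAME": "Clinic A", "YEAR": "2017"},
    {"OSHPD_ID": None, "FACILITY_NAME": "Clinic without id", "YEAR": "2016"},
]
current = [{"FAC_NO": 1002, "FAC_NAME": "Clinic B", "YEAR": 2018, "LATITUDE": 37.0}]

casts = {"FAC_NO": str, "YEAR": int}
legacy_aligned = coerce_types(rename_columns(legacy, LEGACY_TO_CURRENT), casts)
current_aligned = coerce_types(current, casts)
combined = drop_missing(outer_union(legacy_aligned, current_aligned), "FAC_NO")
```

Harmonizing types before combining matters here because an identifier stored as text in one period and as a number in the other would otherwise fail to match in later joins.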
Database Management
Data Merging and Standardization
- Standardized data types and column names across all datasets
- Merged datasets:
  - CMS facility rating dataset with CMS patient survey dataset
  - Filtered CHHS dataset (dialysis clinics only) with merged CMS data
- Reshaped CMS and CHHS data by geographic level
- Merged geographic-level data with SOS Ballot Measures dataset
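The merge sequence above can be sketched end to end on toy data. The field names, the use of inner joins, and the mean as the aggregation function are all assumptions for illustration; the source does not specify the join type or how facility metrics were summarized per geography.

```python
from collections import defaultdict
from statistics import mean


def inner_join(left, right, keys):
    """Join two lists of dicts on the given key columns (inner join)."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined


def mean_by_group(rows, group_keys, value_key):
    """Average value_key within each group, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(value_key) is not None:
            groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    return [
        {**dict(zip(group_keys, key)), value_key: mean(values)}
        for key, values in sorted(groups.items())
    ]


# Hypothetical miniature inputs (placeholders, not project data).
ratings = [
    {"facility_id": "F1", "year": 2022, "five_star": 4},
    {"facility_id": "F2", "year": 2022, "five_star": 2},
]
surveys = [
    {"facility_id": "F1", "year": 2022, "patient_experience": 5},
    {"facility_id": "F2", "year": 2022, "patient_experience": 3},
]
locations = [
    {"facility_id": "F1", "city": "City A"},
    {"facility_id": "F2", "city": "City A"},
]
ballots = [{"city": "City A", "year": 2022, "yes": 400, "no": 600}]

cms = inner_join(ratings, surveys, ["facility_id", "year"])      # CMS ratings + surveys
facilities = inner_join(cms, locations, ["facility_id"])         # add geography
city_level = mean_by_group(facilities, ["city", "year"], "five_star")
city_with_votes = inner_join(city_level, ballots, ["city", "year"])
```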
Final Output
- Two parquet files:
  - Data aggregated at city level
  - Data aggregated at assembly district level
Custom Relational Database System
We developed a custom Python-based relational database system to centralize our datasets and facilitate efficient data access and analysis. Key features include:
- Table and View Structure: Distinct tables for datasets with multiple views for focused data access.
- Dynamic View Creation and Merging: On-the-fly creation of custom views and combination of multiple views for complex analysis.
- Conditional Querying: User-defined conditions for precise data retrieval and filtering.
- Efficient Data Access: Quick and reliable access across the entire database.
- Code Quality: Object-oriented design with consistent naming conventions for improved maintainability and adaptability.
For detailed functionality, refer to the included database demo (Milestone_1/004_data-processing-scripts/002_clean-raw-data/database_demo.ipynb).
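To make the table, view, merge, and conditional-query ideas concrete, here is a small sketch of how such a system might be organized. The class and method names (`Database`, `View`, `create_view`, `merge`, `where`) are hypothetical and are not taken from the project's code; the demo notebook remains the reference for the actual interface.

```python
class View:
    """A named selection of rows that can be filtered or merged."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = [dict(row) for row in rows]

    def where(self, predicate):
        """Return a new view keeping rows for which predicate(row) is true."""
        return View(self.name, [row for row in self.rows if predicate(row)])

    def merge(self, other, on):
        """Inner-join this view with another view on a shared column."""
        lookup = {}
        for row in other.rows:
            lookup.setdefault(row[on], []).append(row)
        merged = [
            {**match, **row}
            for row in self.rows
            for match in lookup.get(row[on], [])
        ]
        return View(f"{self.name}+{other.name}", merged)


class Database:
    """Holds named tables and builds views from them on demand."""

    def __init__(self):
        self._tables = {}

    def add_table(self, name, rows):
        self._tables[name] = [dict(row) for row in rows]

    def create_view(self, table, columns, name=None):
        """Project a table onto the requested columns."""
        rows = [{c: row.get(c) for c in columns} for row in self._tables[table]]
        return View(name or table, rows)


# Placeholder data, not project data.
db = Database()
db.add_table("ratings", [
    {"facility_id": "F1", "five_star": 4, "city": "City A"},
    {"facility_id": "F2", "five_star": 2, "city": "City B"},
])
db.add_table("surveys", [
    {"facility_id": "F1", "patient_experience": 5},
    {"facility_id": "F2", "patient_experience": 3},
])

stars = db.create_view("ratings", ["facility_id", "five_star"])
experience = db.create_view("surveys", ["facility_id", "patient_experience"])
high_rated = stars.merge(experience, on="facility_id").where(lambda r: r["five_star"] >= 3)
```

Returning a new `View` from every operation keeps the stored tables unchanged, so views can be created and combined freely without affecting other analyses.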
Analysis and Insights
This project employed a Bayesian approach to investigate the relationship between dialysis facility quality metrics and voting patterns on dialysis-related propositions in California.
To maximize the granularity of our analysis, we chose to focus on city-level data, which provided more detailed vote counts than the assembly-district-level data. We encoded voting outcomes as the percentage of “Yes” votes in favor of the propositions, allowing for a nuanced examination of support for dialysis industry regulation across different localities.
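The vote encoding can be written as a one-line calculation. The sketch below assumes the denominator is the sum of “Yes” and “No” votes, excluding blank or invalid ballots; the source does not state which denominator was used.

```python
def percent_yes(yes_votes, no_votes):
    """Percentage of Yes votes among Yes and No votes, or None if no votes were cast."""
    total = yes_votes + no_votes
    if total == 0:
        return None
    return 100.0 * yes_votes / total


# Placeholder counts, not election results.
example_share = percent_yes(300, 700)
```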
Despite the limitations and challenges detailed below, we gained some insight into our primary research question: Is the quality of care of dialysis facilities correlated with voting in favor of or against dialysis industry regulation?
We were also able to use visual analysis of the un-modeled data to gain some insights into the geographic coverage of dialysis facilities in the state.
Figure: 2022 missing data heatmap (embedded HTML: ../007_visualizations/2022_missing_data_heatmap.html)
Analysis Steps
Data Preparation
Before modeling, we carried out several data preparation steps. We deferred them to the analysis stage, rather than applying them upstream, in the interest of transparency. The steps were:
- Imputing missing 2018 values using 2019 data for select variables.
- Converting datatypes.
- Calculating vote percentages in favor of regulation for each facility’s city.
- Filtering data to include years 2018, 2020, and 2022.
- Aggregating data at the facility level, summarizing vote outcomes and facility characteristics.
- Removing rows with missing values to perform a complete case analysis.
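The year filter and complete case step can be sketched as follows. Field names are hypothetical, and missing values are treated as either `None` or a floating-point NaN.

```python
import math

ANALYSIS_YEARS = {2018, 2020, 2022}


def is_missing(value):
    """True for None and for floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows, required):
    """Keep rows from the analysis years that have every required field present."""
    return [
        row for row in rows
        if row["year"] in ANALYSIS_YEARS
        and not any(is_missing(row.get(column)) for column in required)
    ]


# Placeholder rows, not project data.
rows = [
    {"facility_id": "F1", "year": 2018, "staff_rating": 4.0, "pct_yes": 41.0},
    {"facility_id": "F2", "year": 2019, "staff_rating": 3.0, "pct_yes": 45.0},
    {"facility_id": "F3", "year": 2020, "staff_rating": float("nan"), "pct_yes": 39.0},
    {"facility_id": "F4", "year": 2022, "staff_rating": 5.0, "pct_yes": None},
]
analysis_rows = complete_cases(rows, ["staff_rating", "pct_yes"])
```

Only F1 survives in this toy example, which shows how quickly a complete case analysis can shrink a dataset when several variables have gaps.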
Model Construction
We constructed a Bayesian multilevel model using the brms package in R. This model allowed us to account for the hierarchical nature of our data (facilities nested within cities and counties) while examining the relationship between facility quality metrics and voting outcomes.
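The exact model formula is not reproduced here, so the following is only one plausible specification consistent with the description: a Gaussian likelihood on the percentage of “Yes” votes $y_i$ for facility $i$, varying intercepts for cities nested within counties, and a county-varying slope for staff rating. The choice of likelihood, the subset of predictors shown, and all symbol names are assumptions.

$$
\begin{aligned}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + a_{\mathrm{county}[i]} + a_{\mathrm{city}[i]} + \left(\beta_1 + b_{\mathrm{county}[i]}\right) \mathrm{staff}_i + \beta_2\, \mathrm{mortality}_i + \beta_3\, \mathrm{experience}_i \\
a_{\mathrm{city}[j]} &\sim \mathrm{Normal}(0, \tau_{\mathrm{city}}) \\
\begin{pmatrix} a_{\mathrm{county}[k]} \\ b_{\mathrm{county}[k]} \end{pmatrix} &\sim \mathrm{MVNormal}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma\right)
\end{aligned}
$$

In brms formula syntax, a model of this shape would look roughly like `pct_yes ~ staff + mortality + experience + (1 + staff | county) + (1 | county:city)`, where the variable names are hypothetical.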
Posterior Predictive Checks:
We performed posterior predictive checks to assess model fit and explore relationships between key variables and voting outcomes.
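The logic of a posterior predictive check can be illustrated without any modeling library: simulate replicated datasets from posterior draws and compare a summary statistic with its observed value. The sketch below assumes a simple Gaussian model with hand-made posterior draws; all numbers are placeholders rather than project results. In brms, checks of this kind are available through `pp_check`.

```python
import random
from statistics import mean


def replicated_means(posterior_draws, n_obs, rng):
    """Simulate one replicated dataset per posterior draw and return each dataset's mean."""
    return [
        mean(rng.gauss(mu, sigma) for _ in range(n_obs))
        for mu, sigma in posterior_draws
    ]


def tail_share(replicated_stats, observed_stat):
    """Share of replicated statistics at least as large as the observed one."""
    return sum(stat >= observed_stat for stat in replicated_stats) / len(replicated_stats)


rng = random.Random(0)
# Placeholder values: observed percent-Yes by city, and posterior draws of
# (mean, standard deviation) from an assumed Gaussian model.
observed = [41.0, 44.5, 47.0, 43.5, 49.0, 45.0]
posterior_draws = [(rng.gauss(45.0, 1.0), 5.0) for _ in range(200)]

rep_stats = replicated_means(posterior_draws, len(observed), rng)
share = tail_share(rep_stats, mean(observed))
```

A share close to 0 or 1 would indicate that the model rarely reproduces the observed statistic, while values near the middle suggest the model captures that aspect of the data reasonably well.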
Insights
Staff Rating Impact:
Our analysis revealed a negative relationship between staff ratings and the percentage of votes in favor of dialysis regulation. This suggests that areas with lower-rated dialysis facility staff were more likely to support increased regulation. The effect of staff rating varied across counties, with some counties showing stronger negative relationships than others.
Mortality Rate Influence:
We found a positive relationship between facility mortality rates and support for regulation. As mortality rates increased, the predicted vote percentage in favor of regulation also increased. This suggests that voters in areas with higher mortality rates at dialysis facilities were more likely to support increased oversight.
Patient Experience Rating:
Interestingly, our analysis showed a positive relationship between patient experience ratings and support for regulation. This counterintuitive finding suggests that even in areas where patients report better experiences, there is still support for increased regulation.
Five Star Rating and Stations per Facility:
These metrics were more weakly associated with voting behavior than staff ratings, mortality rates, and patient experience ratings. The estimated effects suggested by the posterior predictive checks clustered around zero.
Chain Organization Effects:
The posterior predictive check for chain organizations showed varying levels of support for regulation across different dialysis chains, indicating that organizational factors may play a role in shaping public opinion or voting behavior.
Facility Size Considerations:
The number of dialysis stations (a proxy for facility size) showed a slight positive relationship with voting in favor of regulation, suggesting that areas with larger facilities might be more supportive of increased oversight.
Challenges and Limitations
Our analysis faced several key constraints that warrant consideration:
Data Granularity: Facility-level quality metrics were paired with city/county-level voting data, potentially obscuring finer-grained relationships and risking ecological fallacy.
Temporal Dynamics: The model assumes immediate effects of facility metrics on voting behavior, potentially overlooking lagged effects or longer-term trends.
Confounding Factors: Unmeasured variables such as socioeconomic factors, political leanings, or media coverage may influence both facility quality and voting patterns.
Causal Interpretation: While our model reveals associations, causal relationships cannot be inferred without further quasi-experimental or instrumental variable approaches.
Measurement and Data Quality: Quality metrics and aggregated voting data may not perfectly capture true care quality or individual voting behavior, introducing potential measurement error.
External Validity: Findings from California may not generalize to regions with different political, demographic, or healthcare landscapes.
These limitations highlight opportunities for future research, including more granular data collection, time-series analyses, incorporation of additional variables, and replication studies in diverse contexts.
Future Work and Next Steps
Our Bayesian analysis of dialysis facility quality metrics and voting patterns on dialysis-related propositions in California has provided valuable insights. However, several avenues for future research and methodological improvements remain:
- Granular Data Collection:
  - Incorporate sub-city (such as census tract) voting data to mitigate ecological fallacy risks.
  - Collect more detailed patient-level data to better understand the link between personal experiences and voting behavior.
- Temporal Analysis:
  - Investigate how changes in facility quality over time correlate with shifts in voting behavior.
- Additional Variables:
  - Incorporate socioeconomic factors, political leanings, and media coverage data as potential confounders.
- Causal Inference:
  - Explore natural experiments, such as sudden changes in facility ownership or closures, and their impact on voting behavior.